Semi-supervised Rail Defect Detection from Imbalanced Image Data
Authors
Abstract
Rail defect detection from video camera images has recently gained much attention in both academia and industry. Rail image data has two notable properties: it is highly imbalanced towards the non-defective class, and it includes a large number of unlabeled samples that semi-supervised learning techniques can exploit. In this paper we investigate whether positive (defective) candidates selected from the unlabeled data can improve the balance between the two classes and increase performance in detecting a specific type of defect called squats. We also compare data sampling techniques and conclude that semi-supervised techniques are a reasonable alternative for improving performance in applications such as rail track squat detection from image data.
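The idea of drawing positive candidates from unlabeled data to rebalance the classes can be illustrated with a minimal sketch. This is not the paper's method: the nearest-centroid scorer, the function names (`defect_score`, `add_positive_candidates`), the 0.8 threshold, and the toy 2-D feature vectors are all assumptions made for illustration, and a real system would score image features with a trained classifier.

```python
import math


def centroid(rows):
    """Mean feature vector of a list of equally sized feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def defect_score(x, pos_c, neg_c):
    """Score in [0, 1]; higher means x is closer to the defect centroid.

    Hypothetical stand-in for a real classifier's confidence.
    """
    d_pos = math.dist(x, pos_c)
    d_neg = math.dist(x, neg_c)
    return d_neg / (d_pos + d_neg + 1e-12)


def add_positive_candidates(pos, neg, unlabeled, threshold=0.8, rounds=3):
    """Move confidently defect-like unlabeled samples into the positive set.

    Only positive candidates are added, so the minority (defect) class
    grows while the majority class is left unchanged.
    """
    pos, pool = list(pos), list(unlabeled)
    for _ in range(rounds):
        pos_c, neg_c = centroid(pos), centroid(neg)
        picked, rest = [], []
        for x in pool:
            target = picked if defect_score(x, pos_c, neg_c) >= threshold else rest
            target.append(x)
        if not picked:  # nothing confident enough: stop early
            break
        pos.extend(picked)
        pool = rest
    return pos, pool


# Toy data (assumed): 2 labeled defects, 6 labeled non-defects, 5 unlabeled.
labeled_pos = [[5.0, 5.0], [5.5, 4.5]]
labeled_neg = [[0.0, 0.0], [0.5, 0.5], [1.0, 0.0],
               [0.0, 1.0], [0.5, 0.0], [0.0, 0.5]]
unlabeled = [[5.2, 5.1], [4.8, 4.9], [0.2, 0.3], [2.5, 2.5], [0.4, 0.1]]

balanced_pos, remaining = add_positive_candidates(labeled_pos, labeled_neg, unlabeled)
print(len(balanced_pos), len(remaining))
```

In this toy run the two unlabeled points near the defect cluster are added to the positive set, while the ambiguous point at `[2.5, 2.5]` stays in the unlabeled pool because its score is below the threshold.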
Similar resources
Sampling Imbalance Dataset for Software Defect Prediction Using Hybrid Neuro-fuzzy Systems with Naive Bayes Classifier
Software defect prediction (SDP) is a process with difficult tasks in the case of software projects. The SDP process is useful for the identification and location of defects from the modules. This task will tend to become more costly with the addition of complex testing and evaluation mechanisms, when the software project modules size increases. Further measurement of ...
Semi-Supervised Self-training Approaches for Imbalanced Splice Site Datasets
Machine Learning algorithms produce accurate classifiers when trained on large, balanced datasets. However, it is generally expensive to acquire labeled data, while unlabeled data is available in much larger amounts. A cost-effective alternative is to use Semi-Supervised Learning, which uses unlabeled data to improve supervised classifiers. Furthermore, for many practical problems, data often e...
Empowering Imbalanced Data in Supervised Learning: A Semi-supervised Learning Approach
We present a framework to address the imbalanced data problem using semi-supervised learning. Specifically, from a supervised problem, we create a semi-supervised problem and then use a semi-supervised learning method to identify the most relevant instances to establish a well-defined training set. We present extensive experimental results, which demonstrate that the proposed framework significa...
Semi-Supervised Learning for Imbalanced Sentiment Classification
Various semi-supervised learning methods have been proposed recently to solve the long-standing shortage problem of manually labeled data in sentiment classification. However, most existing studies assume the balance between negative and positive samples in both the labeled and unlabeled data, which may not be true in reality. In this paper, we investigate a more common case of semi-supervised ...
Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering
Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper is aimed at proposing an efficient approach to elicit prior knowledge in terms of must-link and cannot-link from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...